# Multimodal Grounding
Kosmos 2 Patch14 24 Dup Ms
MIT
Kosmos-2 is a multimodal large language model that integrates visual information with language understanding to perform image-to-text generation and visual grounding.
Image-to-Text
Transformers

ishaangupta293
Downloads: 21
Likes: 0
Kosmos 2 Patch14 224
MIT
Kosmos-2 is a multimodal large language model that can understand images, generate text descriptions of them, and link phrases in that text to regions of the image.
Image-to-Text
Transformers

microsoft
Downloads: 171.99k
Likes: 162
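Linking phrases to image regions means the generated text carries location information alongside the caption. The sketch below assumes the markup Kosmos-2 emits, where each grounded phrase is wrapped in `<phrase>...</phrase>` and followed by two `<patch_index_NNNN>` tokens naming the upper-left and lower-right cells of a 32 × 32 grid. The grid size, the corner convention, and the helper names `patch_to_box` and `parse_grounded` are assumptions for illustration only; the Transformers processor provides its own `post_process_generation`, which may compute box coordinates differently.

```python
import re
from typing import List, Tuple

# Assumption: location tokens index a 32 x 32 grid of image cells.
NUM_PATCHES_PER_SIDE = 32

# One grounded phrase followed by its pair of location tokens.
GROUNDED = re.compile(
    r"<phrase>(.*?)</phrase><object><patch_index_(\d{4})><patch_index_(\d{4})></object>"
)

Box = Tuple[float, float, float, float]


def patch_to_box(ul: int, lr: int, n: int = NUM_PATCHES_PER_SIDE) -> Box:
    """Convert upper-left / lower-right cell indices to a normalized (x1, y1, x2, y2) box.

    Simplifying assumption: the box runs from the top-left corner of the
    upper-left cell to the bottom-right corner of the lower-right cell.
    """
    cell = 1.0 / n
    ul_row, ul_col = divmod(ul, n)
    lr_row, lr_col = divmod(lr, n)
    return (ul_col * cell, ul_row * cell, (lr_col + 1) * cell, (lr_row + 1) * cell)


def parse_grounded(text: str) -> Tuple[str, List[Tuple[str, Box]]]:
    """Return the plain caption and a list of (phrase, normalized box) pairs."""
    entities = [
        (m.group(1).strip(), patch_to_box(int(m.group(2)), int(m.group(3))))
        for m in GROUNDED.finditer(text)
    ]
    # Keep each phrase's text, drop the markup, location tokens, and task prefix.
    caption = GROUNDED.sub(lambda m: m.group(1), text).replace("<grounding>", "")
    return caption.strip(), entities
```

Multiplying each normalized box by the image width and height gives pixel coordinates for drawing the regions.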
Kosmos 2 Patch14 224
Kosmos-2 is a multimodal large language model that grounds language in real-world visual elements and supports a range of vision-language tasks.
Image-to-Text
Transformers

ydshieh
Downloads: 62
Likes: 54
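Grounding also works in the other direction: a region can be passed to the model as text by encoding its corners as a pair of location tokens. The helper below is a hypothetical sketch, assuming a 32 × 32 grid of cells and the `<phrase>...</phrase><object><patch_index_NNNN><patch_index_NNNN></object>` markup that Kosmos-2 produces in its grounded output; `box_to_patch_indices` and `grounded_phrase` are illustrative names, not part of any library, and the corner-snapping rule is an assumption.

```python
import math
from typing import Tuple

# Assumption: location tokens index a 32 x 32 grid of image cells.
NUM_PATCHES_PER_SIDE = 32


def box_to_patch_indices(
    box: Tuple[float, float, float, float], n: int = NUM_PATCHES_PER_SIDE
) -> Tuple[int, int]:
    """Map a normalized (x1, y1, x2, y2) box to (upper-left, lower-right) cell indices."""
    x1, y1, x2, y2 = box
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError(f"invalid normalized box: {box}")
    # Upper-left corner: the cell containing (x1, y1), clamped to the grid.
    ul_col = min(math.floor(x1 * n), n - 1)
    ul_row = min(math.floor(y1 * n), n - 1)
    # Lower-right corner: the last cell the box reaches into, so an edge lying
    # exactly on a cell boundary does not spill into the next cell.
    lr_col = max(math.ceil(x2 * n) - 1, 0)
    lr_row = max(math.ceil(y2 * n) - 1, 0)
    return ul_row * n + ul_col, lr_row * n + lr_col


def grounded_phrase(phrase: str, box: Tuple[float, float, float, float]) -> str:
    """Wrap a phrase and its region in the assumed location-token markup."""
    ul, lr = box_to_patch_indices(box)
    return (
        f"<phrase>{phrase}</phrase><object>"
        f"<patch_index_{ul:04d}><patch_index_{lr:04d}></object>"
    )
```

Snapping the upper-left corner down and the lower-right corner up keeps the encoded region covering the whole requested box, at the cost of enlarging it to cell boundaries.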